High Level Problem Definition


> Detect hosts infected with malware by observing their network communication.
Malware Command & Control



Lifecycle of malware variants and control server domains over time 





Defining the Problem - C&C Protocol Detection 


> Task: recognizing and attributing C&C communication on live networks.

> Training experience: packet captures of labeled C&C communication.

> Performance measurement: percentage of network communication correctly classified.



DGA-Based Malware 

Defining the Problem - DGA Detection


> Task: recognizing and attributing sets of NXDomains to a DGA. 

> Training experience: labeled sets of NXDomains. 

> Performance measurement: percentage of NXDomains correctly classified.

Generalizing C&C Protocol Structure


Request 1:

GET /Ym90bmq=/cnc.php?v=220&cc=IT
Host: www.bot.net
User-Agent: 680e4a9a


Request 2:

GET /bWFsd2F=/cnc.php?v=139&cc=US
Host: www.malwa.re
User-Agent: dae4a661


Generalizing C&C Protocol 


Request 1: 

GET /Ym90bmq=/cnc.php?v=220&cc=IT 
Host: www.bot.net 
User-Agent: 680e4a9a 


Generalized Request 1: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.bot.net 
User-Agent: <hex,8> 


Request 2: 

GET /bWFsd2F=/cnc.php?v=139&cc=US 
Host: www.malwa.re 
User-Agent: dae4a661 


Generalized Request 2: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.malwa.re 
User-Agent: <hex,8> 
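
The generalization step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`typed`, `generalize_request_line`) and the type-detection heuristics (digit runs are `int`, even-length hex runs of at least four characters are `hex`, padded strings whose length is a multiple of four are `base64`, any other query value is `str`) are hypothetical and are not taken from the slides.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def typed(value):
    """Return a <type,length> placeholder, or None if no type is recognized.

    The detection order and patterns are illustrative assumptions.
    """
    if re.fullmatch(r"\d+", value):
        return f"<int,{len(value)}>"
    if re.fullmatch(r"[0-9a-fA-F]{4,}", value) and len(value) % 2 == 0:
        return f"<hex,{len(value)}>"
    if re.fullmatch(r"[A-Za-z0-9+/]+={1,2}", value) and len(value) % 4 == 0:
        return f"<base64,{len(value)}>"
    return None


def generalize_request_line(request_line):
    """Replace variable path segments and query values with typed placeholders."""
    method, target = request_line.split(" ", 1)
    parts = urlsplit(target)
    # Path segments stay literal unless a type is recognized.
    path = "/".join(typed(seg) or seg for seg in parts.path.split("/"))
    # Query values are always generalized; unknown types fall back to <str,n>.
    pairs = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        placeholder = typed(value) or f"<str,{len(value)}>"
        pairs.append(f"{name}={placeholder}")
    return f"{method} {path}?{'&'.join(pairs)}" if pairs else f"{method} {path}"


if __name__ == "__main__":
    print(generalize_request_line("GET /Ym90bmq=/cnc.php?v=220&cc=IT"))
    print("User-Agent:", typed("680e4a9a"))
```

With these heuristics, both example requests reduce to the same generalized request line, which is what makes them comparable across different C&C domains.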



Features - Query Names 


Generalized Request 1: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.bot.net 
User-Agent: <hex,8> 


Generalized Request 2: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.malwa.re 
User-Agent: <hex,8> 



Features - Query Data Types & Lengths 


Generalized Request 1: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.bot.net 
User-Agent: <hex,8> 


Generalized Request 2: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.malwa.re 
User-Agent: <hex,8> 



Features - Path 


Generalized Request 1: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.bot.net 
User-Agent: <hex,8> 


Generalized Request 2: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.malwa.re 
User-Agent: <hex,8> 



Features - Headers 


Generalized Request 1:

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2>
Host: www.bot.net
User-Agent: <hex,8>


Generalized Request 2:

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2>
Host: www.malwa.re
User-Agent: <hex,8>



Features - IP addresses hosting domain 


Generalized Request 1: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.bot.net 
User-Agent: <hex,8> 


Generalized Request 2: 

GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2> 
Host: www.malwa.re 
User-Agent: <hex,8> 
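
The request-derived features named on the preceding slides (query names, query data types and lengths, path, headers) could be pulled out of a generalized request roughly as follows. This is a minimal sketch: the function name `request_features` and the dictionary keys are hypothetical, and the hosting-IP feature is left out because it needs DNS resolution data that a single request does not carry.

```python
from urllib.parse import urlsplit


def request_features(request_line, headers):
    """Split a generalized request into the feature groups named on the slides."""
    method, target = request_line.split(" ", 1)
    parts = urlsplit(target)
    # Split the query manually so the <type,length> placeholders are kept verbatim.
    pairs = [p.partition("=") for p in parts.query.split("&") if p]
    return {
        "method": method,
        "query_names": [name for name, _, _ in pairs],
        "query_types": [value for _, _, value in pairs],
        "path": parts.path,
        "header_names": list(headers),
        "header_values": dict(headers),
    }


if __name__ == "__main__":
    feats = request_features(
        "GET /<base64,8>/cnc.php?v=<int,3>&cc=<str,2>",
        {"Host": "www.bot.net", "User-Agent": "<hex,8>"},
    )
    for key, value in feats.items():
        print(key, value)
```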



DGA Feature Engineering 

Features - n-gram

[Figure: surviving labels: DGA 2, trigram]

Features - Entropy

[Figure: surviving label: DGA 1]
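
As a rough illustration of the n-gram and entropy features named on these slides, character n-gram counts and Shannon entropy of a domain label can be computed as below. The function names are hypothetical, and the slides do not say exactly how the features were computed, so treat this as one plausible reading rather than the presenters' implementation.

```python
import math
from collections import Counter


def ngram_counts(label, n):
    """Count the character n-grams in a domain label (e.g. n=3 for trigrams)."""
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))


def shannon_entropy(label):
    """Shannon entropy, in bits per character, of the label's character distribution."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    label = "zpdyaislnu"  # label of one of the New-DGA-v5 domains listed later
    print(ngram_counts(label, 3).most_common(3))
    print(round(shannon_entropy(label), 3))
```

Labels drawn from a uniform random alphabet tend to have higher entropy and flatter n-gram distributions than dictionary-based names, which is the intuition behind using both as features.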

Features - Structural

Features - NXDomain/Client

Determining important features


> Try all combinations of features (2^n).

> Forward selection.

> Backward selection.
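
Forward selection avoids the 2^n cost of exhaustive search by adding one feature at a time, which needs at most n(n+1)/2 subset evaluations. The sketch below assumes a caller-supplied `score` function (for example, cross-validated accuracy); the feature names and the toy scoring function are invented purely for illustration.

```python
def forward_selection(features, score):
    """Greedily add the feature that most improves score(subset); stop when none does."""
    selected = []
    remaining = list(features)
    best = float("-inf")
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break  # no extension improves the score
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


def toy_score(subset):
    """Hypothetical score: fixed gain per useful feature minus a per-feature cost."""
    gains = {"entropy": 0.5, "trigram": 0.3, "length": 0.1}
    return sum(gains.get(f, 0.0) for f in subset) - 0.05 * len(subset)


if __name__ == "__main__":
    chosen, final = forward_selection(["length", "entropy", "trigram", "tld"], toy_score)
    print(chosen, round(final, 2))
```

Backward selection is the mirror image: start from the full set and repeatedly drop the feature whose removal hurts the score least.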

Learning: Supervised vs. Unsupervised


[Figure: panels "Supervised" and "Unsupervised"; surviving label: Malware-A]




Unsupervised Learning - Clustering HTTP Requests 








C&C Protocol Detection


> Similarity
    > Measures likeness
    > CPT specific

> Specificity
    > Measures uniqueness
    > Network specific


Input: req, CPT

Similarity: s(req_k, CPT_k),
    for each component k

Specificity: σ(req_k, CPT_k),
    for each component k

Match-Score: f(sim, spec)

If Match-Score > 0:
    return C&C Request
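
The matching procedure above can be sketched as runnable Python. Everything concrete here is an assumption made for illustration: the slide does not define s, σ, or f, so this sketch uses exact-match similarity per component, takes specificity as a precomputed per-component weight in [0, 1], and combines them as a specificity-weighted average. The threshold defaults to 0 only because that is the comparison value that survives in the pseudocode.

```python
def component_similarity(req_value, cpt_value):
    """Toy similarity s: 1.0 on exact match, 0.0 otherwise (assumption)."""
    return 1.0 if req_value == cpt_value else 0.0


def match_score(req, cpt, specificity):
    """Combine per-component similarity and specificity into a single score.

    req and cpt map component names to generalized values; specificity maps
    component names to weights. The weighted average is an assumed form of f.
    """
    if not cpt:
        return 0.0
    total = 0.0
    for name, cpt_value in cpt.items():
        sim = component_similarity(req.get(name, ""), cpt_value)
        total += sim * specificity.get(name, 1.0)
    return total / len(cpt)


def is_cnc_request(req, cpt, specificity, threshold=0.0):
    """Flag the request as C&C when its match score exceeds the threshold."""
    return match_score(req, cpt, specificity) > threshold


if __name__ == "__main__":
    cpt = {
        "path": "/<base64,8>/cnc.php",
        "query": "v=<int,3>&cc=<str,2>",
        "user_agent": "<hex,8>",
    }
    weights = {"path": 1.0, "query": 1.0, "user_agent": 1.0}
    print(is_cnc_request(dict(cpt), cpt, weights))
```

Separating similarity (a property of the template) from specificity (a property of the monitored network) lets the same template be weighted differently on each deployment network.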



Unsupervised Learning - Clustering DGAs 

Supervised Learning - Modeling DGAs

Modeling Tools


> Scikit-learn
    > Collection of machine learning algorithms (Python).
    > http://scikit-learn.org/stable/

> Weka
    > Collection of machine learning algorithms (Java).
    > http://www.cs.waikato.ac.nz/ml/weka/

> R
    > Language and environment for statistical computing and graphics.
    > http://www.r-project.org/

Evaluation Data




C&C Evaluation Deployment Networks


                    UNetA        UNetB        FNet
Distinct Src IPs    7,893        27,340       7,091
HTTP Requests       34,871,003   66,298,395   58,019,718
Distinct Domains    149,481      238,014      113,778


♦ Evaluation ran for two weeks. 

♦ CPTs updated daily beginning two weeks prior to evaluation. 

Network Deployment Results

k-fold Cross Validation
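
A minimal sketch of how k-fold cross validation partitions a dataset: the samples are split into k folds, and each fold serves once as the test set while the remaining k-1 folds form the training set. The function name `k_fold_indices` is hypothetical, and the sketch does not shuffle or stratify, which a real evaluation would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 5)):
        print(fold, train, test)
```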

DGA Classifier - 10-fold Cross Validation


Botnet       TP Rate   FP Rate
Bobax        99%       0%
Conficker    99%       0.1%
Sinowal      100%      0%
Murofet      99%       0.2%
Benign       99%       0.1%
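
Per-class TP and FP rates like those in the table are computed one class at a time, treating that class as positive and all others as negative. The sketch below shows the arithmetic; the function name and the four-sample toy data are hypothetical and unrelated to the reported results.

```python
def per_class_rates(y_true, y_pred, label):
    """Return (TP rate, FP rate) for one class in a multi-class prediction."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in zip(y_true, y_pred))
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["Bobax", "Bobax", "Benign", "Benign"]
    preds = ["Bobax", "Benign", "Benign", "Bobax"]
    print(per_class_rates(truth, preds, "Bobax"))
```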



DGA Clustering - ISP Deployment 


> Six confirmed DGA-based malware 

> Six new DGAs with no known malware family at discovery


Malware Family       First Seen   Population on Discovery
Shiz/Simda-C [32]    03/20/11     37
Bamital [11]         04/01/11     175
BankPatch [5]        04/01/11     28
Expiro.Z [8]         04/30/11     7
Bonnana [41]         08/03/11     24
Zeus.v3 [25]         09/15/11

New-DGA-v2

New-DGA-v4

semklcquvjufayg02orednzdfg.com
invfgg4szr22sbjbmdqm51pdtf.com
0vqbqcuqdvOi1fadodtm5iumye.com
nplr0vnqjr3vbs3c3iqyuwe3vf.com
s3fhkbdu4dmc001tmxskleeqrf.com
gupliapsm2xiedyefet2lsxete.com
y5rk0hgujfgo0t4sfers2xolte.com
me5oclqrfano4z0mx4qsbpdufc.com
jwhnr2uu3zp0ep40cttq3oyeed.com
ja4baqnv02qoxlsjxqrszdziwb.com


New-DGA-v5

zpdyaislnu.net
vvbmjfxpyi.net
oisbyccilt.net
vgkblzdsde.net
bxrvftzvoc.net
dycsmcfwwa.net
dpwxwmkbx1.net
ttbkuogzum.net


New-DGA-v6

C&C Protocol Detection

DGA Detection Deployment

Thank You!